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Abstract 


The  Earth  Mover’s  Distance  (EMD)  between  two  finite  distributions  of  weight  is  proportional 
to  the  minimum  amount  of  work  required  to  transform  one  distribution  into  the  other.  Cur¬ 
rent  content-based  retrieval  work  in  the  Stanford  Vision  Laboratory  uses  the  EMD  as  a 
common  framework  for  measuring  image  similarity  with  respect  to  color,  texture,  and  shape 
content.  In  this  report,  we  present  some  fast  to  compute  lower  bounds  on  the  EMD  which 
may  allow  a  system  to  avoid  exact,  more  expensive  EMD  computations  during  query  pro¬ 
cessing.  The  effectiveness  of  the  lower  bounds  is  tested  in  a  color-based  retrieval  system.  In 
addition  to  the  lower  bound  work,  we  also  show  how  to  compute  the  EMD  under  translation. 
In  this  problem,  the  points  in  one  distribution  are  free  to  translate,  and  the  goal  is  to  find  a 
translation  that  minimizes  the  EMD  to  the  other  distribution. 
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1  Introduction 


Recent  image-based  retrieval  work  ([11,  12])  in  the  Stanford  Vision  Laboratory  (SVL)  has  con¬ 
centrated  on  providing  a  common  framework  for  measuring  image  similarity  with  respect  to  color, 
texture,  and  shape  content.  In  this  framework,  the  summary  or  signature  of  an  image  is  a  finite 
collection  of  weighted  points.  For  example,  in  [11]  the  color  content  signature  of  an  image  is  a 
collection  of  dominant  image  colors  represented  in  the  CIE-Lab  space,  where  each  color  is  weighted 
by  the  fraction  of  image  pixels  classified  as  that  color.  In  [12],  the  texture  content  signature  of 
a  single  texture  image  is  a  collection  of  dominant  spatial  frequencies,  where  each  frequency  is 
weighted  by  the  amount  of  energy  at  that  frequency.  In  current  shape-based  retrieval  work,  the 
shape  content  signature  of  an  image  is  a  collection  of  points  in  parameter  spaces  of  basic  shapes 
(such  as  line  segments  and  circular  arcs)  which  fit  well  into  image  edges,  where  each  basic  shape 
occurrence  is  weighted  by  its  length.  To  complete  the  uniform  framework,  a  distance  measure  on 
weight  distributions  is  needed  to  measure  similarity  between  image  signatures. 

The  Earth  Mover’s  Distance  (EMD)  between  two  distributions  is  proportional  to  the  minimum 
amount  of  work  required  to  transform  one  distribution  into  the  other.  Here  one  unit  of  work  is 
defined  as  the  amount  of  work  necessary  to  move  one  unit  of  weight  by  one  unit  of  distance.  The 
transformation  process  can  be  visualized  as  filling  holes  with  piles  of  dirt.  The  holes  are  located 
at  the  points  in  the  lighter  distribution,  and  the  dirt  piles  are  located  at  the  points  in  the  heavier 
distribution.  The  volume  of  a  hole  or  dirt  pile  is  given  by  the  weight  value  of  its  position.  If  the 
total  weights  of  the  distributions  are  equal,  then  all  the  dirt  is  used  to  fill  the  holes.  Otherwise, 
there  will  be  dirt  leftover  after  all  the  holes  have  been  completely  filled.  The  EMD  is  defined 
to  be  the  minimum  amount  of  work  to  fill  the  holes  divided  by  the  total  weight  of  the  lighter 
distribution.  Normalizing  by  the  amount  of  dirt  moved  means  the  EMD  will  not  change  if  the 
weights  of  both  distributions  are  multiplied  by  a  constant.  The  EMD  is  a  metric  when  the  total 
weights  of  the  distributions  are  equal  and  the  “ground  distance”  between  holes  and  dirt  piles  is  a 
metric  ([12]).  There  is  a  very  efficient  method  for  computing  the  EMD  which  is  based  on  a  solution 
to  the  well-known  transportation  problem  ([4])  in  operations  research. 

In  current  SVL  content-based  retrieval  systems,  the  distance  between  two  images  is  taken  as 
the  EMD  between  the  two  corresponding  signatures.  The  query  time  is  dominated  by  the  time 
to  perform  the  EMD  computations.  Two  common  types  of  queries  are  nearest  neighbor  queries 
and  range  queries.  In  a  nearest  neighbor  query,  the  system  returns  the  K  database  images  which 
are  closest  to  the  given  query.  In  a  range  query,  the  system  returns  all  database  images  which  are 
within  some  distance  r  of  the  query.  For  both  query  types,  fast  lower  bounds  on  the  EMD  may 
decrease  the  query  time  by  avoiding  slower,  exact  EMD  computations.  During  nearest  neighbor 
query  processing,  an  exact  EMD  computation  need  not  be  performed  if  there  is  a  lower  bound 
on  the  EMD  which  is  greater  than  the  Kth  smallest  distance  seen  so  far.  During  range  query 
processing,  an  exact  EMD  computation  need  not  be  performed  if  there  is  a  lower  bound  on  the 
EMD  which  is  greater  than  r.  Of  course,  whether  or  not  the  query  time  decreases  when  a  lower 
bound  is  used  depends  upon  the  number  of  exact  EMD  computations  avoided  and  the  computation 
times  for  the  exact  EMD  and  the  lower  bound. 

It  is  known  ([12])  that  the  distance  between  the  centroids  of  two  equal-weight  distributions  is 
a  lower  bound  on  the  EMD  between  the  distributions.  There  are,  however,  common  situations 
in  which  distributions  will  have  unequal  weights.  For  example,  consider  the  color-based  retrieval 
work  [11]  in  which  the  weight  of  a  dominant  image  color  is  equal  to  the  fraction  of  pixels  classified 
as  that  color.  Assuming  all  the  pixels  in  an  image  are  classified,  the  weight  of  every  database 
signature  is  one.  EMD  comparisons  between  unequal- weight  distributions  arise  whenever  the  system 
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is  presented  with  a  partial  query  such  as:  ’’give  me  all  images  with  at  least  20%  sky  blue  and  30% 
green”.  The  query  signature  consists  of  two  points  in  CIE-Lab  space  with  weights  equal  to  0.20 
and  0.30,  and  therefore  has  total  weight  equal  to  0.50.  In  the  texture  world,  it  seems  difficult 
to  accurately  classify  every  pixel  in  an  image  as  one  of  a  handful  of  dominant  image  textures. 
In  this  case,  using  the  fraction  of  classified  pixels  as  weight  means  that  image  distributions  will 
have  different  weights.  Of  course,  partial  texture  queries  such  as  ’’give  me  all  the  images  with  at 
least  30%  sand  and  30%  sky”  also  imply  comparisons  between  distributions  of  unequal  weight.  In 
our  current  shape-based  retrieval  work,  the  weight  of  a  basic  shape  that  occurs  in  an  image  or 
illustration  is  equal  to  its  length.  Using  length  as  weight,  two  image  shape  distributions  are  very 
likely  to  have  different  total  weights.  In  all  three  cases,  the  total  weight  of  a  distribution  is  equal 
to  the  amount  of  information  present  in  the  underlying  image.  Since  one  cannot  assume  that  all 
database  images  and  queries  will  contain  the  same  amount  of  information,  lower  bounds  on  the 
EMD  between  unequal-weight  distributions  may  be  quite  useful  in  retrieval  systems. 

The  first  part  of  this  report  is  dedicated  to  lower  bounds  on  the  EMD,  and  is  organized  as 
follows.  In  section  2,  we  give  some  basic  definitions  and  notations  that  will  be  used  thoughout  the 
report.  This  section  includes  a  formal  definition  of  the  Earth  Mover’s  Distance.  In  section  3,  we 
prove  the  centroid-distance  lower  bound  for  equal-weight  distributions  (section  3.1),  and  then  we 
extend  the  idea  behind  this  lower  bound  to  obtain  a  centroid-based  lower  bound  between  unequal- 
weight  distributions  (section  3.2).  In  section  4,  we  present  lower  bounds  which  use  projections 
of  distribution  points  onto  random  lines  through  the  origin  and  along  the  directions  of  the  axes. 
These  “projection-based”  lower  bounds  involve  the  EMD  between  distributions  on  the  real  line, 
which  is  the  subject  of  section  5.  For  one-dimensional  distributions,  we  provide  very  efficient 
algorithms  to  compute  (1)  the  EMD  between  equal-weight  distributions  and  (2)  a  lower  bound  on 
the  EMD  between  unequal-weight  distributions.  Both  these  algorithms  use  a  single  sweep  over  the 
distribution  points.  Furthermore,  the  lower  bound  for  unequal  weight  case  gives  the  exact  EMD 
when  applied  in  the  equal  weight  case.  In  combination  with  the  projection-based  lower  bounds  in 
section  4,  the  exact  and  lower  bound  computations  in  one-dimension  yield  fast  to  compute  lower 
bounds  in  general  dimensions  for  both  the  equal  and  unequal-weight  inputs.  In  section  6,  we  show 
some  experiments  that  use  our  lower  bounds  in  the  previously  mentioned  color-based  image  retrieval 
system. 

Another  potentially  useful  area  of  exploration  is  computing  the  EMD  under  some  given  trans¬ 
formation  group,  such  as  the  group  of  translations.  In  this  problem,  the  points  in  one  distribution 
can  be  transformed,  and  the  goal  is  to  find  a  transformation  that  minimizes  the  EMD  to  the  other 
distribution.  An  application  is  shape-based  retrieval,  where  visual  similarity  may  not  be  captured 
by  a  direct  comparison  of  the  shapes  present  in  two  images  due  to  differences  in  scale,  orienta¬ 
tion,  and/or  position.  In  the  second  part  of  this  report,  we  consider  the  problem  of  computing 
the  EMD  under  translation.  In  section  7,  we  give  both  a  direct  algorithm  (section  7.1)  and  an 
iterative  algorithm  (section  7.2)  for  this  problem.  The  direct  algorithm  is  conceptually  simple  and 
is  guaranteed  to  find  a  globally  optimal  translation,  but  it  is  not  practical  because  it  requires  an 
unreasonable  amount  of  time.  The  iterative  method  is  efficient,  but  it  may  find  only  a  locally 
optimal  translation.  Nonetheless,  it  may  find  a  globally  optimal  translation  if  the  iteration  is  run 
with  a  few  different  initial  translations.  Both  algorithms  require  a  subroutine  that  computes  a 
point  which  minimizes  the  sum  of  weighted  distances  to  a  given  set  of  points.  This  problem  is  the 
subject  of  section  8  where  we  give  solutions  when  the  distance  function  is  the  L2-distance  squared 
(section  8.1),  the  Li-distance  (section  8.2),  and  the  Euclidean  i^-distance  (section  8.3).  Finally,  in 
section  9,  we  give  some  concluding  remarks  on  both  EMD  lower  bounds  and  computing  the  EMD 
under  a  transformation  group. 
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Note  that  the  results  presented  in  this  report  may  still  be  very  useful  if  one  is  interested  in  only 
the  minimum  work  instead  of  the  EMD,  or  one  wants  to  use  a  different  normalization  factor  than  the 
weight  of  the  lighter  distribution.  Statements  about  the  EMD  may  be  transformed  into  statements 
about  the  minimum  work  by  multiplying  through  by  the  smaller  weight.  In  fact,  our  reasoning 
about  the  EMD  usually  proceeds  by  reasoning  about  the  work  and  dividing  by  the  appropriate 
constant  in  the  last  step. 


2  Basic  Definitions  and  Notations 


We  denote  a  finite  distribution  x  as 

x  =  {  {xi,w1),(x2,w2),...,{xn,wn)  }  =  ( X,w )  G  Dd,n 


where 

X  =  [x\  xn  ]  €  Rdxn  and  w  >  0. 

Here  d  is  the  dimension  of  the  points  Xi  €  Rd,  and  n  is  the  number  of  points.  For  a  vector  v,  let 
us  be  the  sum  of  the  components  of  v.  The  (total)  weight  of  the  distribution  x  is 

n 

wx  =  Y.wr 

3= 1 


Given  two  distributions  x  =  (X,w)  G  Dd’m  and  y  =  ( Y,u )  G  Dd’n,  a  flow  between  x  and  y  is  any 
matrix  F  =  (fij)  €  Rmxn.  Intuitively,  fij  represents  the  amount  of  weight  at  xt  which  is  matched 
to  weight  at  y3 .  An  equally  valid  interpretation  for  flj  is  the  amount  of  weight  at  tjj  which  is 
matched  to  weight  at  xt.  The  term  flow  is  meant  to  evoke  the  image  of  weight  flowing  from  the 
points  in  the  heavier  distribution  to  the  points  in  the  lighter  distribution  until  all  the  weight  in  the 
lighter  distribution  has  been  covered.  If  one  distribution  is  known  to  be  heavier  than  the  other, 
then  we  shall  write  that  a  flow  is  from,  the  heavier  distribution  to  the  lighter  distribution.  The  flow 
F  is  a  feasible  flow  between  x  and  y  iff 


fij 

> 

o 

c* . 

II 

h— 1 

71 

< 

Wi  i  =  1, 

J=1 

771 

Y.f‘> 

< 

Uj  j  =  1. 

2=1 

771  71 

min(ms,«E) 

2  =  1  j  =  1 


(1) 

(2) 

(3) 

(4) 


Constraint  (1)  requires  the  amount  of  Xi  matched  to  yj  to  be  non-negative.  Constraint  (2)  ensures 
that  the  weight  in  y  matched  to  x%  does  not  exceed  Wi.  Similarly,  (3)  ensures  that  the  weight  in  x 
matched  to  y3  does  not  exceed  Uj.  Finally,  constraint  (4)  forces  the  total  amount  of  weight  matched 
to  be  equal  to  the  weight  of  the  lighter  distribution. 

Let  F(x,y)  denote  the  set  of  all  feasible  flows  between  x  and  y.  The  work  done  by  a  feasible 
flow  F  G  T{x.  y)  in  matching  x  and  y  is  given  by 


m  n 


WORK  (F,x,y)  =  Y,Y/fiAi, 

i=lj=l 
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where 


dij  —  d{x<i,yj) 

is  the  distance  between  X{  and  yj.  Throughout  most  of  this  report  we  shall  use  the  Euclidean 
distance  d(xi^yj)  =  || X{  —  yj\\2  as  the  ground  distance  d,  and  this  choice  should  be  assumed  unless 
otherwise  indicated.  The  Earth  Mover’s  Distance  EMD(:r,y)  between  x  and  y  is  the  minimum 
amount  of  work  to  match  x  and  y,  normalized  by  the  weight  of  the  lighter  distribution: 


EMD  (a;,  y) 


minjr=(/..)g^(:C;j/)  i  g|=i  fjjdy 
min  (wx,uz) 


m'mF={fij)e:F(Xjy)  WORK{F,x,y) 
min(iOE,us) 


(5) 


The  work  minimization  problem  in  the  numerator  of  (5)  is  a  linear  program,  and  hence  can  be 
solved  by  applying  the  simplex  algorithm  ([10]).  Applying  the  simplex  method  instead  to  the  dual 
linear  program  results  in  an  increasing  sequence  of  objective  function  values,  each  of  which  is  a 
lower  bound  on  the  EMD.  In  contrast,  all  lower  bounds  presented  in  this  report  are  independent 
of  the  algorithm  used  to  compute  the  exact  EMD. 


3  Centroid-based  Lower  Bounds 

The  centroid  x  of  the  distribution  x  =  ( X ,  w)  G  Drf,n  is  defined  as 

_  £”=  1  WjXj 
x  =  — - - . 

In  section  3.1  we  shall  prove  that  the  distance  between  the  centroids  of  distributions  is  a  lower 
bound  on  the  EMD  between  distributions  of  equal  weight.  There  is  also,  however,  a  centroid-based 
lower  bound  if  the  distributions  are  not  equal  weight.  If  x  =  (X,w)  is  heavier  than  y  =  (Y,u), 
then  all  of  the  weight  in  y  is  matched  to  part  of  the  weight  in  x.  The  weight  in  x  which  is  matched 
to  y  by  an  optimal  flow  is  a  sub-distribution  x'  of  x.  Formally,  a  sub-distribution  x'  =  (X',w')  of 
x  =  ( X ,  w )  €  Dd,ra,  denoted  x'  C  x,  is  a  distribution  with  X'  =  X  and  0  <w'<w: 

x'  =  {  (xi,wi), . .  .,(xn,w'n)  }  =  (X,wf)  €  Dd,n,  0  <  Wj  <  Wj  for  j  =  1,...  ,n. 

In  words,  the  points  of  a  sub-distribution  x'  are  the  same  as  the  points  of  x  and  the  weights  of 
x'  are  bounded  by  the  weights  of  x.  One  can  visualize  a  sub-distribution  x'  C  x  as  the  result  of 
removing  some  of  the  dirt  in  the  piles  of  dirt  in  x.  The  minimum  distance  between  the  centroid 
of  y  and  the  locus  of  the  centroid  of  sub-distributions  of  x  of  total  weight  uy.  is  a  lower  bound  on 
EMD(.'i;.  y).  Details  are  given  in  section  3.2. 

3.1  Distributions  of  Equal  Weight 

Theorem  1  Suppose  x  =  (X,  w )  G  Drf,m  and  y  =  (Y,  u)  G  Drf,n  are  distributions  of  equal  total 
weight  wz  =  u 2 .  Then 

EMDlHI(z,y)  >  ||x— y||. 

Here  the  ground  distance  ||  •  ||  is  any  Lp  norm  used  to  measure  d(x.i:yj). 

Proof  The  equal  weight  requirement  implies  that  for  any  feasible  flow  F  =  ( fy ), 


m 

E  A?  =  U3  and 

i=  1 

n 

(6) 

Ea?  =  wi- 

(7) 

3= 1 
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Then 


m 

n 

1  m  77  m 

n  ! 

Y^WiXi 

-  E  um 

= 

E  E  fijXi  EZ  E  A??/? 

1=1 

3= 1 

== 

i=l  j=l  7=1  j 

m  n 

y  y  (xi _  yj ) 

7=1  J  =  1 

m  7i 

=1  II 

((6),  (7)) 


E  WiXi  -  J2  ujVj 

i=  1  j= 1 


<  EEn/»  («i  -  Vj)\\  ( A-inequality) 

i=l  j= 1 
m  n 

~  y y /^ii^ ~~ y.?n  (a?  — o) 

7=1  J  =  1 
77i  n 

<  EE-foll**  -yjll- 

7=1  j=l 


Dividing  both  sides  of  the  last  inequality  by  ins  =  yields 

YaL  1  ]Cj=l  Ajlfoi  ~  2/ j  1 1 


l®-y||  < 


w-z 


for  any  feasible  flow  F.  Replacing  F  by  a  work  minimizing  flow  gives  the  desired  result.  Note  that 
this  proof  holds  for  every  Lp  distance/norm  ||  •  ||.  ■ 


3.2  Distributions  of  Unequal  Weight 

Let  x  —  ( X ,  w)  E  D d,m  and  y  =  (Y,  u)  E  Dd,n  be  distributions  with  ins  >  us-  In  any  feasible  flow 
F  —  ( fij )  from  x  to  y,  all  of  the  weight  Uj  must  be  matched  to  weight  in  x 

m 

y  ~  uj-> 

i— 1 


and  the  total  amount  of  matched  weight  is 


Let 


EE/ii = 

7=1  J  =  1 


^  ”  {  O^l?  A?  )’  (^27  /2j)l  *  *  *  5  (^7715  fmj)  }  —  (-^5  W  )• 

i=i  j=i  i=i 


Clearly,  w£  =  us-  From  the  previous  section  we  know  that 


It  follows  that 


EMD(xf,«/)  > 

EMD(sf,j/)  >  min 

~  F'eF(x,y) 


xF  -y 


x*  -y 


(8) 
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where  the  minimum  is  taken  over  all  feasible  flows  Ff  from  x  to  y.  Since  (8)  holds  for  every  feasible 
flow  F  from  x  to  y,  we  can  replace  F  by  a  work  minimizing  flow  F*  and  obtain 


EMD(#,y)  =  EMD(:rF  ,  y)  >  min 

F'eT(x,y) 


(9) 


The  minimum  on  the  right-hand  side  of  the  inequality  (9)  can  be  re-stated  as  the  minimum  distance 
of  the  centroid  of  y  to  the  centroid  of  any  sub-distribution  of  x  of  total  weight 


min 

F'eH^y) 


min 

x'  =  (X,  w ')  C  x 

4  =  ut, 


(10) 


Clearly,  xF'  is  a  sub-distribution  of  x  with  total  weight  ue  for  every  F'  G  F(x,y).  It  remains  to 
argue  that  any  sub-distribution  x'  C  x  with  total  weight  us  is  xF  for  some  F'  G  T (x,  y) .  Since  x' 
and  y  are  equal-weight  distributions,  any  one-to-one  matching  of  the  weights  in  x'  and  y  defines  a 
feasible  flow  between  x'  and  y  and,  therefore,  between  x  and  y.  Combining  (9)  and  (10), 


EMD(i,  j/)  >  min 

x'  =  (X,  w')  C  x 
w'E  = 


(11) 


In  section  3.2.1  we  show  how  this  minimization  problem  can  be  formulated  as  the  minimization  of 
a  quadratic  function  subject  to  linear  constraints.  However,  solving  this  quadratic  programming 
problem  is  likely  to  take  more  time  than  computing  the  EMD  itself.  In  section  3.2.2  we  show  how 
to  compute  a  bounding  box  for  the  locus  of  the  centroid  of  any  sub-distribution  of  x  of  total  weight 
ue-  The  minimum  distance  from  the  centroid  of  y  to  the  bounding  box  is  a  lower  bound  of  the 
EMD,  although  it  is  obviously  not  as  tight  as  the  lower  bound  in  (11). 


3.2.1  The  Centroid  Lower  Bound 

Given  a  distribution  x  =  (X,  w)  G  Dd,m,  the  locus  of  the  centroid  of  sub-distributions  of  x  of  weight 
cra>s,  0  <  a  <  1,  is 


Cl 


■w  =  { 


Efei  Vi 


:  0  <  Wi  <  wu  0  <  wj:  =  awy. 


Let 


Then 


Wi 


Vi  = 


and  Wi  — 


Wi 


Wz 


aw?, 


1  w 


Ca(x)  =  <  ViXi  :  0  <  v  <  w  = - ,  —  l  >  , 

I  t—'  rv 


i— 1 


or,  in  terms  of  matrix  multiplication, 


a  w% 


Ca(x)  =  {  Xv  :  0  <  v  <  w  =  —  lTv  =  1  }. 

a  wz 


(12) 


The  symbol  “1”  is  overloaded  in  the  constraint  \Tv  —  1;  on  the  left-hand  side  it  is  a  vector  of  m 
ones,  while  on  the  right-hand  side  it  is  simply  the  integer  one.  It  is  easy  to  see  from  (12)  that 


Cai(x)  D  Ca2(x)  if  a\  <  c*2- 
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The  locus  Ca( x)  is  a  convex  polytope.  The  intersection  of  the  halfspaces  v  >  0  and  v  <  w  is  a 
convex  polytope  Pi .  The  intersection  of  Pi  with  the  hyperplane  lTv  =  1  is  another  convex  polytope 
P2  of  one  dimension  less.  Finally,  applying  the  linear  map  X  to  P2  gives  the  convex  polytope  Ca(x). 
In  [1],  the  authors  characterize  and  provide  algorithms  to  compute  the  locus  Cl,h{S)  of  the  centroid 
of  a  set  S  of  points  with  approximate  weights,  where  weight  w,  lies  in  a  given  interval  [k,  hi]  and 
the  total  weight  W  is  bounded  as  L  <  W  <  H.  The  locus  Ca(x)  =  Ci^(X)  if  [k,  hn]  —  [0,  ibj]. 

Now  suppose  that  y  =  (Y,  u)  G  Dd,n  is  a  lighter  distribution  than  x.  In  the  previous  section  we 
argued  that  the  EMD"  is  bounded  below  by  the  minimum  distance  from  y  to  a  point  in  CUs/w*(x). 
We  denote  this  minimum  distance  as  CLOC(x,  y)  because  it  uses  the  locus  of  the  centroid  of  sub¬ 
distributions  of  x  of  weight  This  lower  bound  can  be  computed  by  minimizing  a  quadratic 
objective  function  subject  to  linear  constraints: 

(CLOC(a;, y))2  =  min  \\Xv-y\\l 

subject  to 

v  >  0 

1 

V  <  w  =  — w 
U-£ 

lTv  =  1. 

The  above  minimization  problem  consists  of  m  variables  and  2 m  +  1  linear  constraints  which  are 
taken  directly  from  (12). 


3.2.2  The  Centroid  Bounding  Box  Lower  Bound 


As  previously  mentioned,  the  computation  of  the  CLOC  lower  bound  as  described  in  the  previous 
section  is  likely  to  require  more  time  than  an  exact  EMD  computation.  Yet  the  centroid  locus 
Ca(x)  can  still  be  very  useful  in  finding  a  fast  to  compute  lower  bound  on  the  EMD.  The  idea 
is  to  precompute  a  bounding  box  Ba(x)  for  Ca(x)  for  a  sample  of  a  values,  say  a  =  0.05A:  for 
k  =  1, . . . ,  20.  When  given  a  lighter  query  distribution  y  at  query  time,  the  minimum  distance  from 
y  to  the  bounding  box  Bay(x)  is  a  lower  bound  on  EMD(.t,  y),  where  ay  is  the  largest  sample  a 
value  which  does  not  exceed  the  total  weight  ratio  u^/w^  (the  correctness  of  ay  follows  from  the 
containment  property  (14)).  This  lower  bound  computation  will  be  very  fast  because  the  bounding 
boxes  are  precomputed  and  the  query  time  computation  of  the  minimum  distance  of  the  point  y  to 
the  box  Bav  (x)  is  a  constant  time  operation  (it  depends  only  on  the  dimension  d,  not  the  number 
of  points  in  a;  or  y). 

If  we  write  the  matrix  X  in  terms  of  its  rows  as 


then 


X  = 


L  <d  -I 


e  R' 


dxm 


r  YT 


Xv  = 


r{  v 


nv 


G  Rr 
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The  computation  of  an  axis-aligned  bounding  box  for  the  centroid  locus  Ca(x)  can  be  accomplished 
by  solving  the  2d  linear  programs 


ak  —  mm  r^v,  =  max  rjv  k  =  1, . . . , d 


subject  to 


v  >  0 

v  <  w  =  — — w  (13) 

otw% 

lT  V  =  1. 

Each  of  these  linear  programs  has  m  variables  and  2m  +  1  constraints.  The  axis-aligned  bounding 
box  for  the  centroid  locus  Ca(x)  is 

d 

Ba(x)  =  IlkM 

k=l 

As  with  the  true  centroid  loci  Ca(x ),  we  have  a  containment  property  for  the  bounding  boxes 
Ba{x): 

Bai{x)  D  Ba2{x)  if  ai  <  a2.  (14) 

This  fact  can  be  verified  by  observing  that  the  constraints  over  which  the  minima  a *  and  maxima 
bk  are  computed  get  weaker  as  a  decreases  (the  only  constraint  involving  a  is  (13)).  Note  also  that 
the  box  Ba(x)  includes  its  “interior”  so  that  the  lower  bound  CBOX(a;,y)  is  zero  if  y  lies  “inside” 
Bay(x).  Using  the  CBOX  lower  bound  instead  of  the  CLOC  lower  bound  trades  off  computation 
speed  for  pruning  power  since  the  former  is  much  faster  to  compute,  but 

EMD(£,y)  >  CLOC(^,y)  >  CBO X(®,y). 

Nevertheless,  the  pruning  power  of  the  CBOX  lower  bound  will  be  high  when  the  query  distribution 
is  well-separated  from  many  of  the  database  distributions  (which  implies  that  the  centroids  will 
also  be  well-separated). 

4  Project  ion- based  Lower  Bounds 

For  v  on  the  unit  sphere  Sd~l  in  Rd,  the  projection  pr()j,,(.x)  of  the  distribution  x  =  {X.  w)  £  Rd>TO 
along  the  direction  v  is  defined  as 


proj„(a;)  =  {  {vTxi,  wx),  {vTx2,  w2), . . . ,  (vTxm,  wm)  }  =  (vTX,w)  £  D1,m. 

In  words,  the  projection  along  v  is  obtained  by  using  the  lengths  of  the  projections  of  the  distribution 
points  along  v  and  leaving  the  corresponding  weights  unchanged.  The  following  lemma  shows  that 
the  EMD  between  projections  is  a  lower  bound  on  the  EMD  between  the  original  distributions. 

Lemma  1  Let  v  £  Sd~l.  Then 

EMD(a;,y)  >  EMD(proj„(a:),projJy)). 
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Proof  This  theorem  follows  easily  from  the  definition  of  the  EMD  and  the  fact  that 


\vTxi-vTyj\ 


\vTXi  -  vTyj I 


=  \vT{xi-yj)  | 

=  ||u||2  \\xi  2/j  1 1 2  |  COS  @Vt(xi-yj)\ 

—  | 2/j 1 12  |  CQsOV)(Xi—  y^\ 

<  ||®i  — 


The  following  theorem  is  an  immediate  consequence  of  Lemma  1. 
Theorem  2  Let  V  =  {iq, . . .  ,vl}  C  Sd~ 1  and 

PMAX(y,  x, y)  =  max  EMD(projl)(x),proj„(y)) 


Then 

EMD{x,y)  >  PMA X(V,x,y). 

For  this  lower  bound  to  be  of  practical  use,  we  must  be  able  to  compute  it  efficiently.  In  section  5, 
we  present  a  straightforward,  @(m  +  n)  time  algorithm  to  compute  the  EMD  between  equal-weight 
distributions  on  the  line.  In  combination  with  Theorem  2,  this  algorithm  provides  the  means  to 
compute  quickly  a  lower  bound  on  the  EMD  between  two  equal-weight  distributions. 

One  pruning  strategy  is  to  pick  a  set  of  random  directions  V  along  which  to  perform  projections, 
and  apply  Theorem  2  to  obtain  a  lower  bound.  The  hope  is  that  the  differences  between  two 
distributions  will  be  captured  by  looking  along  one  of  the  directions  in  V .  Another  pruning  strategy 
is  to  use  the  set  of  orthogonal  axis  directions  for  the  set  V.  The  following  corollary  is  an  immediate 
consequence  of  Theorem  2. 

Corollary  1  Let 

E  =  {e1,...,ed}cSd~1 

be  the  set  of  axis  directions,  and  let 

PAMAX(rr,  y)  =  PMAX(E,x,y). 


Then 

EMD(i,  y)  >  PAMAX(x,  y). 

Looking  along  the  space  axes  is  intuitively  appealing  when  each  axis  measures  a  specific  property. 
For  example,  suppose  that  distribution  points  are  points  in  the  CIE-Lab  color  space  ([16]).  If  two 
images  are  very  different  in  terms  of  the  luminance  values  of  pixels,  then  comparing  the  signature 
projections  along  the  L-axis  will  reveal  this  difference  and  allow  the  system  to  avoid  an  exact  EMD 
computation. 

When  the  projection  directions  are  the  coordinate  axes,  we  can  prove  a  lower  bound  which 
involves  the  sum  of  the  EMDs  along  axis  directions. 

Theorem  3  If 


then 


1  d 

PASUM(x,  y)  =  — EMD(projejb(x),proje/t(y)), 


EMD(x,y)  >  PASUM(ar,  y). 
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Proof  The  proof  uses  the  fact  that 


IMk  > 

for  any  vector  a  G  Rd,  a  proof  of  which  may  be  found  in  appendix  I.  It  follows  that 

m  n  m  n 

S  A?  1 \x*  ”  Vi  I  i2  —  ~/=5  WXi  ~  Vj  1 1 1 

va  <=i  j=i 


»=1 j=l 


-j  m  n  d 

fij  E 

yai=ij=i  k= i 


d  m  n 


ik=lt=l  j=l 
d  m  n 

V«  /c=l  «=1  7 


(*)  (A:) 

-Vj 


(k)  (k) 

x\  -  y) 


m  n  d  m  n 

£5>lh-»ih  > 


2=1  j  =  l  VU,fc=l2=lj  =  l 

where  the  superscript  (/c)  denotes  the  kth  component  of  a  vector.  Therefore, 

d  m  n 

(*)  ..(*) 


mm 


Fe^l3/)  j=lj=1 


**  -I  U  HI  It 

EE/bll^-yilb  >  p mm  tt^EEE h 


F&n*,y) 

-id  m  n 

>  ^=E  min  YYfn 

Vdfr{F^(x,y) 


xy  -y. 


(k)  (k) 

x\  -  y' 


1  d 

=  -^=  E(min(^,«s)  X  EMD(projefc(z),projCjt(y))) 
1  d 

=  -^min(ms,Ws)  EEMD(Pr°jefc(3;))Proje*(y)) 

Vd  fc=l 

m  n  ^  d 

j“y)  EE/bll^-%ll2  >  ^min(uJE,ui:)EEMD(projet.(3;),projej,(y)). 

Dividing  both  sides  of  the  last  inequality  by  min(w;£,U£)  gives  the  desired  result. 

Note  that  PA  SUM [x.  y )  may  be  rewritten  as 

PASUMfx,  ; ,)  =  Vd(  £LiEMD(proj.t(x),proje,(i,))\ 


V 

This  alternate  expression  makes  it  clear  that  PASUM(z,  y)  is  a  better  lower  bound  than  PAMAX(x,  y) 
iff  the  square  root  of  the  dimension  times  the  average  axis  projection  distance  is  greater  than  the 
maximum  axis  projection  distance. 


5  The  EMD  in  One  Dimension 

Let  x  =  (X,w)  G  D1,m  and  y  =  (Y,  u)  G  D1>n  be  distributions  on  the  real  line.  Assume  the  points 
in  x  and  y  are  sorted  by  position: 

xi  <  x2  <  ■  •  ■  <  xm  and  2/i  <  V2  <■■■  <yn- 
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Define  the  cumulative  distribution  function  (CDF)  of  x  as 


W(t)  =  { 


0 

Eil  *H 

•»E  =  E£l«»i 


if  t  G  (—00, aq) 

if  t  G  [ xk ,  aq+i),  1  <  k  <  m  —  1 
if  t  G  [ajm,  00). 


Similarly,  the  CDF  of  y  is 


U(t)  =  { 


0 

£j=i  ui 
UZ  =  £j=l  uj 


if  t  G  (—00, yi) 

if  t  G  [y/,yi+i)>  1  <  /  <  n  -  1 
if  t  G  [yn,  00)  • 


If  x  and  y  are  equal  weight,  then  the  work  to  transform  one  distribution  into  the  other  is  the  area 
between  the  graphs  of  the  CDFs  of  x  and  y.  See  figure  1.  We  will  now  prove 


Theorem  4  If  x  -  {X,w)  G  D1,m  and  y  =  ( Y,u )  G  D1,n  have  equal  weight  w%  =  us,  then 


EMD(cc,  y) 


Ho  iw(t) -£/(*)!  eft 

WE 


Proof  Let 


<  ^2  <  •  •  •  <  rm+n 


be  the  sorted  list  of  breakpoints  xi,x2,.  ■■  ,xm,yi,y2,---  ,Vn-  Note  that  W(t)  and  U(t)  are  constant 
over  the  interval  t  G  [r^,  r^+i)  for  k  =  1, . . . ,  m  +  n  —  1,  W(t)  =  U(t)  =  0  for  t  G  (—00,  ri),  and 
W(t)  =  U{t)  =  we  =  for  t  G  [rm+n,  00).  Therefore  the  integral  of  the  absolute  difference  of  the 
CDFs  may  be  written  as  the  finite  summation 


/OO 

\W(t)-U(t)  | 

-OO 


dt  = 


m+n—l 

E  {rk+l-rk)\W{rk)-U{rk)\. 


k= 1 


We  claim  that  there  is  exactly  one  feasible  flow  F  that  can  morph  x  into  y.  Consider  the  interval 
(rk,  rk+{).  At  any  position  t  in  this  interval,  the  absolute  difference  \W (t)  -  [/(f) |  is  equal  to 
| W{rk)  -  U{rk)\.  Suppose  that  W(rk)  >  U(rk).  Then  in  any  feasible  flow  from  x  to  y,  exactly 
W(rk)  -  lJ(rk)  weight  from  x  must  be  moved  from  rk  to  rk+].  If  less  than  this  amount  is  moved, 
then  there  will  be  less  x  weight  than  y  weight  in  [rk+ 1, 00)  after  the  flow  is  complete.  If  more  than 
this  amount  is  moved,  then  there  will  be  more  x  weight  than  y  weight  in  [rk+i,  00)  after  the  flow  is 
complete.  Moving  weight  from  rk+ 1  to  rk  would  only  increase  the  surplus  of  x  weight  in  (-oo,rfc]. 
See  figure  2(a).  Similar  logic  shows  that  if  U{rk)  >  W(rk),  then  exactly  U(rk)  -  W(rk)  weight 
from  x  must  be  moved  from  rk+ 1  to  rk.  This  case  is  illustrated  in  figure  2(b).  In  either  case,  the 
amount  of  work  Ek  done  in  moving  weight  from  x  over  the  interval  (rk,rk+ 1)  is 


Ek  =  (rk+1-rk)  \W{rk)  —  U(rk)\. 


The  total  work  E  performed  in  the  unique  feasible  flow  from  x  to  y  is 


ra+n— 1 

E  —  E  E* 

k~  1 

It  follows  that 

EMD(x,y)  =  — , 
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Figure  1:  The  cumulative  distribution  functions  (CDFs)  for  the  equal- weight  line  distributions  x 
and  y  are  W  ( t )  and  repsectively.  The  work  to  transform  x  into  y  is  equal  to  the  area  between 

the  two  CDFs.  The  unique  transforming  flow  is  shown  with  directed  lines  from  x  weight  to  the 
matching  y  weight.  The  EMD  between  x  and  y  is  obtained  by  dividing  the  work  by  the  total  weight 
of  the  distributions  (u?s  =  =  13  in  the  picture). 


Case,  wy  =  uy,  W{rk)  >  U{rk),  w-e  -  W(rk)  <  -  U(rk) 


(b) 


Case,  we  =  ue ,  W(rk)  <  U(rk ),  we  —  W(rk )  >  ue  -  U(rk ) 


W(rk) 

< 

U(rk ) 


I7(rfc) 

J7(rfc) 


we-W  (rk) 
> 

ue  -  U(rk ) 


u>E  - 


ue  ~  U{rk ) 


Figure  2:  The  unique  feasible  flow  between  equal- weight  distributions  x  —  (X,  w )  and  y  =  (Y,  u) 
on  the  line.  Here  r\  <  •■■  <  rm+n  is  the  position-sorted  list  of  points  in  x  and  y,  and  W (t)  and 
U(t)  are  the  CDFs  for  x  and  y,  respectively,  (a)  W(rk)  >  U(rk ),  we  ~~  W(rk )  <  ue  —  U(rk).  In 
this  case,  a  flow  from  x  to  y  is  feasible  only  if  exactly  W (rk)  —  U(rk)  of  x  weight  travels  from  rk  to 
rk+ 1  during  the  flow,  (b)  W(rk)  <  U{rk),  we  -  W{rk)  >  ue  -  U{rk).  In  this  case,  a  flow  from  x 
to  y  is  feasible  only  if  exactly  U(rk)  —  W(rk)  of  x  weight  travels  from  rk  (  \  to  rk  during  the  flow. 
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and  this  completes  the  proof. 


When  the  weights  of  the  distributions  are  unequal,  there  is  no  longer  a  unique  feasible  flow. 
However,  arguments  similar  to  those  used  above  can  be  used  to  compute  a  lower  bound  on  any 
feasible  flow.  Once  again  consider  the  interval  {rk,  r^+i),  and  WLOG  assume  wz  >  uz  and  that 
x  weight  is  moved  to  match  all  the  y  weight.  When  there  is  more  x  weight  than  y  weight  in 
both  (—00,  rk]  and  [r^+i,  oo),  then  there  will  be  feasible  flows  in  which  no  x  weight  travels  through 
('rk,‘i'k+i)-  If  there  is  more  x  weight  than  y  weight  in  (—00,  rk],  but  less  x  weight  than  y  weight  in 
[r^+i,oo),  then  {uz  —  U{rk))  —  {wz  —  W{rk))  of  the  x  weight  must  be  moved  from  rk  to  rk+\  in 
order  to  cover  the  y  weight  in  00).  See  figure  3(a).  If  there  is  less  x  weight  than  y  weight  in 

(— oc,rfc],  but  more  x  weight  than  y  weight  in  [rfc+1,00),  then  U {rk)  —  W{rk)  of  the  x  weight  must 
be  moved  from  rk+i  to  rk  in  order  to  cover  the  y  weight  in  (—00,  rk].  This  case  is  illustrated  in 
figure  3(b).  Under  the  assumption  that  wz  >  uz,  it  cannot  be  the  case  that  there  is  less  x  weight 
than  y  weight  in  both  (-oo,r/c]  and  [rfc+1,00). 

Pseudocode  for  the  lower  bound  described  in  the  previous  paragraph  is  given  below.  The 
routine  is  named  FSBL  because  the  lower  bound  follows  simply  from  flow  feasibility  (FeaSiBiLity) 
conditions. 

function  FSBL(a:,y)  :=  /*  assumes  d  =  1,  wz  >  uz  */ 
work  =  0 
n  =  min(:ri,  ?/i) 
for  k  =  1  to  m  +  n  —  1 

rk+i  =  smallest  point  in  x  or  y  that  is  greater  than  rk 
if  uz  —  U{rk)  >  wz  —  W ( Tk )  then 
work  +=  {{uz  -  U{rk ))  -  {wz  -W{rk)))  x  {rk+l  -  rk) 
elseif  U{rk)  >  W{rk)  then 
work  +=  ( U{rk )  -W{rk))  x  {rk+1  -  rk) 
end  if 
end  for 

return  (work  /  uz) 
end  function 


We  have  argued  that 

Theorem  5  If  x  and  y  are  distributions  on  the  line,  then 

EMD(r, y)  >  FSBL (xty). 

If  wz=uz,  then  {uz  -U{rk)  >  wz~W {rk))  =  {W{rk)  >U {rk)),  {uz  -  U {rk))  -  {wz  -  W {rk))  = 
W{rk )  —  U{rk),  and  the  routine  computes  the  exact  value  EMD(i.  y). 

Theorem  6  If  x  and  y  are  two  equal-weight  distributions  on  the  line,  then 

EMD(x,j/)  =  FSBL(x,y). 

Assuming  that  the  points  in  a;  €  D1,m  and  y  c  D1,n  are  in  sorted  order,  the  routine  runs  in  linear 
time  0(m  +  n).  The  combined  sorted  list  n, . . .  ,rm+n  of  points  in  x  and  y  is  discovered  by  walking 
along  the  two  sorted  lists  of  points.  At  any  time  during  the  algorithm,  there  is  a  pointer  to  the 
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Case.  wy  >  uy,  W{rk)  >  U(rk),  wy  -  W(rk)  <  uy  -  U{rk) 


(b) 


Case,  wy  >  uy,  W(rk)  <  U(rk),  wy  -  W(rk)  >  uy  -  U(rk) 


Figure  3:  Necessary  conditions  for  a  feasible  flow  between  unequal-weight  distributions  x  =  (X,  w) 
and  y  =  {Y,u)  on  the  line,  where  wy  >  uy-  All  y  weight  must  be  covered  by  x  weight,  (a) 
W{rk )  >  U(rk),  wy  —  W(rk )  <  uy  —  U(rk).  In  this  case,  a  necessary  condition  to  have  a  feasible 
flow  from  x  to  y  is  that  at  least  (wy  ~W(rk))  -  (uy  -  U{rk))  of  x  weight  travels  from  rk  to  rk+ 1 
during  the  flow,  (b)  W{rk)  <  U(rk),  wy  -  W{rk )  >  uy  -  U(rk ).  In  this  case,  a  necessary  condition 
to  have  a  feasible  flow  from  x  to  y  is  that  at  least  U(rk )  —  W(rk)  of  x  weight  travels  from  rk+ 1  to 
rk  during  the  flow. 
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next  x  and  next  y  value  to  be  considered.  The  value  r&+ 1  then  follows  in  constant  time  from  the 
value  of  rfc. 

The  FSBL  lower  bound  may  be  substituted  for  the  EMD  function  in  the  PMAX,  PAMAX,  and 
PASUM  lower  bounds  to  obtain  efficient  to  compute,  projection-based  lower  bounds 

PMAXFsbl {V,x,y)  =  max  FSBL(proj,„(a;),  proj„(y)) 

vEv 

=  PMAX(V,a:,y)  when  wy  =  uy 


PAMAXfsbl(^,  y) 


=  FSBL(proje  (*),proje  (y)) 

=  PAMAX(:r,  y )  when  wy  =  uy 


PASUMFSBL(^,y) 


~n  X  FSBL(proj6(t  (x), projej,  (y)) 
Va  k= 1 

PASUM(:r,y)  when  wy  =  uy 


in  which  x  and  y  are  not  necessarily  equal  weight.  The  second  equality  in  each  of  the  three  pairs 
of  equalities  follows  directly  from  Theorem  6  and  the  definitions  of  PMAX(V,  x,  y),  PAMAX(ri;,  y), 
and  PASUM(x,y). 


6  Experiments  in  Color-based  Retrieval 

In  this  section,  we  show  some  results  of  using  the  lower  bounds  CBOX,  PMAXfsbl,  PAMAXfsbl, 
and  PASUMfsbl  in  the  color-based  retrieval  system  described  in  [11].  This  system  summarizes 
an  image  by  a  distribution  of  dominant  colors  in  the  CIE-Lab  color  space,  where  the  weight  of 
a  dominant  color  is  equal  to  the  fraction  of  image  pixels  which  are  classified  as  that  color.  The 
input  to  the  system  is  a  query  and  a  number  K  of  nearest  images  to  return.  The  system  computes 
the  EMD  between  the  query  distribution  and  each  of  the  database  distributions.  If  the  query  is 
a  full  image  (e.g.  an  image  in  the  database),  then  the  query  distribution  and  all  the  database 
distributions  will  have  total  weight  equal  to  one.  In  this  query-by-example  setting,  the  system  first 
checks  the  distance  between  distribution  centroids  before  performing  an  exact  EMD  computation. 
If  the  centroid  distance  is  larger  than  the  ATth  largest  distance  seen  before  the  current  comparison, 
then  the  system  does  not  compute  the  EMD  and  simply  considers  the  next  database  image.  A 
AT-nearest  neighbor  database  image  to  the  query  cannot  be  missed  by  this  algorithm  because  the 
centroid  distance  is  a  lower  bound  on  the  EMD  between  equal-weight  distributions.  When  the 
query  is  a  partial  query  (such  as  “give  me  all  the  images  with  at  least  20%  sky  blue”),  an  exact 
EMD  computation  is  performed  between  the  query  and  every  database  image. 

To  use  the  CBOX  lower  bound  for  partial  queries,  some  additional  preprocessing  is  needed.  At 
database  entry  time,  the  distribution  x  =  (X,  w)  of  an  image  is  computed  and  stored,  as  well  as  the 
centroid  bounding  boxes  Ba(x)  for  a  —  0.05k,  k  —  1, . . . ,  20.  Given  a  query  distribution  y  =  {Y,  u) 
of  weight  uy  <  wy,  let  ay  denote  the  largest  sample  a  value  which  does  not  exceed  the  total  weight 
ratio  uy/wy-  The  system  computes  the  distance  between  y  and  the  nearest  point  in  Bay(x).  This 
is  the  CBOX  lower  bound.  To  use  the  PMAXfsbl  lower  bound,  a  set  V  of  L  (specified  later) 
random  projection  directions  and  the  L  position-sorted  projections  of  each  database  distribution 
along  the  directions  in  V  are  computed  and  stored  at  database  load  time.  At  query  time,  the  query 
distribution  is  also  projected  along  the  directions  in  V.  To  use  the  PAMAXfsbl  and  PASUMfsbl 
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lower  bounds,  the  d  position-sorted  projections  of  each  database  distribution  along  the  space  axes 
are  computed  and  stored  at  database  entry  time.  At  query  time,  the  same  axis  projections  are 
performed  on  the  query  distribution. 

There  are  many  factors  that  affect  the  performance  of  our  lower  bounds.  The  most  obvious  is 
the  database  itself.  Here,  we  use  a  Corel  database  of  20000  color  images  which  is  dominated  by 
outdoor  scenes.  The  order  in  which  the  images  are  compared  to  the  query  is  also  important.  If 
the  most  similar  images  to  a  query  are  processed  first,  then  the  Xth  smallest  distance  seen  will  be 
relatively  small  when  the  dissimilar  images  are  processed,  and  relatively  weak  lower  bounds  can 
prune  these  dissimilar  images.  Of  course,  the  purpose  of  the  query  is  to  discover  the  similar  images. 
Nonetheless,  a  random  order  of  comparison  may  help  ensure  good  performance  over  a  wide  range 
of  queries.  Moreover,  if  a  certain  type  of  query  is  more  likely  than  others,  say,  for  example,  queries 
with  large  amounts  of  blue  and  green  (to  retrieve  outdoor  images  containing  sky  and  grass),  then 
it  would  be  wise  to  pre-determine  a  good  comparison  order  to  use  for  such  queries.  In  the  results 
that  follow,  the  comparison  order  is  the  same  for  all  queries,  and  the  order  is  not  specialized  for 
any  particular  type  of  query. 


The  number  K  of  nearest  images  to  return  is  yet  another  factor.  For  a  fixed  comparison  order 
and  query,  the  number  of  exact  EMD  calculations  pruned  is  inversely  related  to  the  size  of  K.  This 
is  because  the  Kth  smallest  distance  after  comparing  a  certain  number  images,  against  which  a 
lower  bound  is  compared,  is  an  increasing  function  of  K.  In  all  the  upcoming  experiments,  the 
number  of  nearest  images  returned  is  fixed  at  K  =  20.  In  terms  of  the  actual  lower  bounds,  a 
system  may  be  able  to  achieve  better  query  times  by  using  more  than  one  bound.  For  example, 
a  system  might  apply  the  CBOX  lower  bound  first,  followed  by  the  more  expensive  PASUMfsbl 
bound  if  CBOX  fails,  followed  by  an  even  more  expensive  exact  EMD  computation  if  PASUMfsbl 
also  fails.  The  hope  is  that  the  lower  bound  hierarchy  of  CBOX,  PASUMfsbl?  and  EMD  speeds  up 
query  times  in  much  the  same  way  that  the  memory  hierarchy  of  primary  cache,  secondary  cache, 
and  main  memory  speeds  up  memory  accesses.  Our  experiments,  however,  apply  one  lower  bound 
per  query.  For  the  PMAXfsbl  lower  bound,  the  number  L  of  random  directions  must  be  specified. 
This  parameter  trades  off  between  pruning  power  and  computation  speed.  The  more  directions, 
the  greater  the  pruning  power,  but  the  slower  the  computation.  In  our  work,  we  use  the  heuristic 
L  =  2d  (without  quantifiable  justification),  where  d  is  the  dimension  of  the  underlying  point  space 
(so  L  =  6  in  the  color- based  system). 

All  experiments  were  conducted  on  an  SGI  Indigo2  with  a  250  MHz  processor,  and  query 
times  are  reported  in  seconds  (s).  The  exact  EMD  is  computed  via  an  efficient  solution  to  the 
transportation  problem  based  on  the  work  [6].  The  color  signature  of  a  typical  database  image  has 
eight  to  twelve  points.  The  time  for  an  EMD  calculation  between  two  such  images  varies  roughly 
between  half  a  millisecond  and  one  millisecond  (ms).  The  EMD  computation  time  increases  with  the 
number  of  points  in  the  distributions,  so  EMD  computations  involving  a  partial  query  distribution 
with  only  a  few  points  are,  in  general,  faster  than  EMD  computations  between  two  database  images. 
The  time  for  an  EMD  computation  between  a  database  image  and  a  partial  query  with  three  or 
fewer  points  is  typically  about  0.25ms. 


We  begin  our  experiments  with  a  few  very  simple  queries.  Each  of  these  queries  consists  of  a 
distribution  with  exactly  one  color  point  in  CIE-Lab  space.  The  results  of  the  three  queries 
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(a) 


(b) 


Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

2.210 

CBOX 

19675 

0.193 

PMAXFsbl 

19715 

0.718 

PAMAXfsbl 

19622 

0.441 

PASUMfsbl 

18969 

0.536 

Figure  4:  Query  C.1.1  -  20%  blue,  (a)  query  results,  (b)  query  statistics. 


C.1.1  at  least  20%  (sky)  blue 


C.1.2  at  least  40%  green 


C.1.3  at  least  60%  red 


and 


are  shown  in  figure  4,  figure  5,  and  figure  6,  respectively.  In  these  examples,  all  the  lower  bounds 
result  in  query  times  which  are  less  than  the  brute  force  query  time,  and  avoid  a  large  fraction 
of  exact  EMD  computations.  The  CBOX  and  PASUMFSbl  bounds  gave  the  best  results  on  these 
three  queries. 


The  next  set  of  examples  consists  of  randomly  generated  partial  queries.  The  results  for  the 
five  queries 
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Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

3.043 

CBOX 

19634 

0.233 

PMAXfsbl 

10172 

2.552 

PAMAXfsbl 

16222 

1.124 

PASUMfsbl 

18424 

0.754 

Figure  5:  Query  C.1.2  -  40%  green,  (a)  query  results,  (b)  query  statistics. 


Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

2.920 

CBOX 

19621 

0.240 

PMAXfsbl 

15903 

1.505 

PAMAXfsbl 

17125 

0.871 

PASUMfsbl 

18182 

0.785 

Figure  6:  Query  C.1.3  -  60%  red.  (a)  query  results,  (b)  query  statistics. 


Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

4.240 

CBOX 

18704 

0.496 

PMAXFsbl 

17989 

1.323 

PAMAXfsbl 

17784 

1.035 

PASUMfsbl 

18418 

0.832 

Figure  7:  Query  C.2.1  -  13.5%  green,  3.4%red,  17.8%  yellow,  (a)  query  results,  (b)  query  statistics. 


C.2.1  13.5%  green,  3.4%red,  17.8%  yellow 

C.2.2  26.0%  blue,  19.7%  violet 

C.2.3  16.8%  blue,  22.2%  green,  1.8%  yellow 

C.2.4  22.8%  red,  24.2%  green,  17.3%  blue 


C.2.5  13.2%  yellow,  15.3%  violet,  15.3%  green 


and 


are  shown  in  figure  7  through  figure  11,  respectively.  The  CBOX  lower  bound  gives  the  best  results 
for  queries  C.2.1  and  C.2.2,  but  its  performance  drops  by  an  order  of  magnitude  for  C.2.3,  and  it  is 
completely  ineffective  for  C.2.4  and  C.2.5.  Indeed,  the  CBOX  lower  bound  pruned  only  1  of  20000 
database  images  for  query  C.2.5.  The  CBOX  behavior  can  be  explained  in  part  by  the  locations 
of  centroids  of  the  query  distributions  and  the  database  distributions.  See  figure  12.  Roughly 
speaking,  the  effectiveness  of  the  CBOX  bound  is  directly  related  to  the  amount  of  separation 
between  the  database  distributions  and  the  query  distribution,  with  larger  separation  implying 
a  more  effective  bound.  The  query  C.2.1  consists  almost  entirely  of  green  and  yellow.  As  one 
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Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

3.812 

CBOX 

18631 

0.453 

PMAXpsBL 

16472 

1.452 

PAMAXfsbl 

17032 

1.010 

PASUMfsbl 

17465 

1.037 

Figure  8:  Query  C.2.2  -  26.0%  blue,  19.7%  violet,  (a)  query  results,  (b)  query  statistics. 


(a) 


(b) 


Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

4.073 

CBOX 

1631 

3.999 

PMAXfsbl 

10550 

3.235 

PAMAXfsbl 

11690 

2.648 

PASUMfsbl 

15386 

1.612 

Figure  9:  Query  C.2.3  -  16.8%  blue,  22.2%  green,  1.8%  yellow,  (a)  query  results,  (b)  query 
statistics. 
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Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

3.969 

CBOX 

26 

4.158 

PMAXfsbl 

3606 

4.342 

PAMAXfsbl 

3399 

4.010 

PASUMfsbl 

12922 

2.324 

Figure  10:  Query  C.2.4  -  22.8%  red,  24.2%  green,  17.3%  blue,  (a)  query  results,  (b)  query  statistics. 


(a) 


(b) 


^|Jj| 

gg| 

1 

i 

...  -  If 

Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

3.375 

CBOX 

1 

3.560 

P  MAXfsbl 

9608 

2.924 

PAMAXfsbl 

10716 

2.381 

PASUMfsbl 

15562 

1.492 

Figure  11:  Query  C.2.5  -  13.2%  yellow,  15.3%  violet,  15.3%  green,  (a)  query  results,  (b)  query 
statistics. 
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can  see  from  figure  12,  the  centroid  of  C.2.1  is  very  isolated  from  the  database  centroids.  The 
approximately  equal  amounts  red,  green,  and  blue  in  query  C.2.4  result  in  a  centroid  which  is  close 
to  a  large  number  of  database  centroids.  The  same  statement  holds  for  query  C.2.5  which  has  green 
and  yellow  in  one  corner  of  the  CIE-Lab  space,  and  violet  at  the  opposite  corner. 


The  distances  of  the  centroids  for  C.2.2  and  C.2.3  to  the  database  centroids  are  (i)  about  the 
same,  and  (ii)  are  smaller  than  the  distance  for  C.2.1  and  larger  than  the  distances  for  C.2.4  and 
C.2.5.  Observation  (ii)  helps  explain  why  the  performance  of  CBOX  on  C.2.2  and  C.2.3  is  worse 
than  the  performance  on  C.2.1,  but  better  than  the  performance  on  C.2.4  and  C.2.5.  Observation  (i) 
might  lead  one  to  believe  that  the  CBOX  performance  should  be  about  the  same  on  C.2.2  and  C.2.3. 
The  statistics,  however,  show  that  this  is  not  the  case.  To  understand  why,  we  must  remember  that 
the  queries  are  partial  queries.  The  relevant  quantity  is  not  the  centroid  of  a  database  distribution, 
but  rather  the  locus  of  the  centroid  of  all  sub-distributions  with  weight  equal  to  the  weight  of  the 
query.  Consider  images  with  significant  amounts  of  blue  and  green,  and  other  colors  which  are 
distant  from  blue  and  green  (such  as  red).  The  other  colors  will  help  move  the  distribution  centroid 
away  from  blue  and  green.  However,  a  sub-distribution  of  such  an  image  which  contains  only  blue 
and  green  components,  will  have  a  centroid  which  is  close  to  blue  and  green,  and  hence  close  to 
the  centroid  of  C.2.3.  The  distance  between  the  query  centroid  and  this  image  centroid  may  be 
large,  but  the  CBOX  lower  bound  will  be  small  (and,  hence,  weak).  Prom  figure  12  and  the  results 
of  C.2.2  and  C.2.3,  one  can  infer  that  there  are  many  more  images  that  contain  blue,  green,  and 
significant  amounts  of  distant  colors  from  blue  and  green  than  there  are  images  that  contain  blue, 
violet,  and  significant  amounts  of  distant  colors  from  blue  and  violet.  The  centroid  is  a  measure  of 
the  (weighted)  average  color  in  a  distribution,  and  the  average  is  not  an  accurate  representative  of 
a  distribution  with  high  variance  (i.e.  with  colors  that  span  a  large  portion  of  the  color  space). 


The  projection-based  lower  bounds  PMAXfsbl,  PAMAXfsbl,  PASUMfsbl  compare  two  dis¬ 
tributions  by  comparing  the  distributions  projected  along  some  set  of  directions.  The  PMAXfsbl, 
PAMAXfsbl,  and  PASUMfsbl  lower  bounds  make  stronger  use  of  a  distribution  than  simply  re¬ 
ducing  it  to  its  average  point,  so  there  is  hope  that  the  these  bounds  will  help  when  the  CBOX 
bound  is  ineffective.  In  queries  C.2.3,  C.2.4,  and  C.2.5,  the  projection-based  lower  bounds  prune 
far  more  EMD  calculations  than  the  CBOX  bound.  However,  pruning  a  large  number  of  EMD 
calculations  does  not  guarantee  a  smaller  query  time  than  achievable  by  brute  force  because  of  the 
overhead  of  computing  a  lower  bound  when  it  fails  to  prune  an  EMD  calculation.  In  all  the  ran¬ 
dom  partial  queries  C.2.*,  the  query  times  for  PMAXfsbl,  PAMAXfsbl,  and  PASUMfsbl  were 
less  than  the  query  times  for  brute  force  processing,  except  for  the  PMAXfsbl  and  PAMAXfsbl 
bounds  in  query  C.2.4.  In  particular,  the  PASUMfsbl  bound  performed  very  well  for  all  the 
queries.  Since  the  projection-based  lower  bounds  are  more  expensive  to  compute  than  the  CBOX 
lower  bound,  they  must  prune  more  exact  EMD  calculations  than  CBOX  in  order  to  be  as  effective 
in  query  time. 


The  queries  in  the  final  two  examples  of  this  section  are  both  images  in  the  Corel  database. 
The  results  of  the  queries 
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Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

o 

15.768 

CBOX 

19622 

0.535 

PMAXfsbl 

19635 

1.522 

PAMAXfsbl 

19548 

1.062 

PASUMfsbl 

18601 

1.847 

Figure  13:  Query  C.3.1  -  sunset  image,  (a)  query  results,  (b)  query  statistics. 


C.3.1 


C.3.2 


and 


are  shown  in  figure  13  and  figure  14,  respectively.  The  distributions  for  queries  C.3.1  and  C.3.2 
contain  12  and  13  points,  respectively.  Notice  that  the  brute  force  query  time  for  the  C.3.*  queries  is 
much  greater  than  the  brute  force  query  time  for  the  C.I.*  and  C.2.*  queries.  The  difference  is  that 
both  the  query  and  the  database  images  have  a  “large”  number  of  points  for  the  C.3.*  queries.  All 
the  lower  bounds  perform  well  for  query  C.3.1,  but  the  CBOX  lower  bound  gives  the  lowest  query 
time.  Recall  that  the  CBOX  lower  bound  reduces  to  the  distance  between  distribution  centroids 
for  equal-weight  distributions.  The  centroid  distance  pruned  many  exact  EMD  calculations  for 
C.3.1  because  most  of  the  weight  in  the  distribution  is  around  yellow  and  orange,  far  from  the 
centroids  of  the  database  images  (as  one  can  see  in  figure  12).  The  blue,  green,  and  brown  in 
query  C.3.2  span  a  larger  part  of  the  color  space  than  the  colors  in  C.3.1,  the  query  centroid  is 
close  to  many  database  centroids  (once  again,  see  figure  12),  and  the  centroid  distance  lower  bound 
does  not  perform  as  well  as  for  C.3.1.  The  projection-based  lower  bounds,  however,  each  give  a 
better  query  time  for  query  C.3.2  than  the  centroid-distance  bound.  Recall  that  the  lower  bounds 
PMAXfsbl,  PAMAXfsbl,  and  PASUMfsbl  reduce  to  the  stronger  lower  bounds  PMAX,  PAMAX, 
and  PASUM  for  equal-weight  distributions.  The  PASUMfsbl  lower  bound  yields  a  tolerable  query 
time  for  query  C.3.2. 
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(a) 


(b) 


S  a _ = . 5i . . .  = _ _ 2 

f«j  Query  Result 

1. . . . 

Wm 

Lower  Bound 

#  Pruned 

Query  Time  (s) 

NONE 

0 

14.742 

CBOX 

9571 

8.106 

PMAXfsbl 

15094 

5.893 

PAMAXfsbl 

13461 

6.741 

PASUMfsbl 

17165 

3.343 

Figure  14:  Query  C.3.2  -  image  with  trees,  grass,  water,  and  sky.  (a)  query  results,  (b)  query 
statistics. 


7  The  EMD  under  Translation 


Given  a  distribution  y  =  (Y,  u )  €  Dd,n,  let  y  ©  t  e  Dd,n  denote  the  translation  of  y  by  t  €  Rrf: 

V  ©*  =  {  (yi  +t,ui),  (y2  +  t,u2),... ,  (yn  +  t,un)  }. 

The  EMD  under  translation  EMD-7-(:e,  y)  is  defined  as 


EMD7-(a;,y)  =  min  EMD(x,j/®i). 
te  Rd 


If 

m  n 

hd(F,  t )  =  WORK(F,  x,  y  ©  t)  =  £  E  Vj  +  *), 

*=1  j  =  l 


then 


EMDj-(a:,  y) 


minteRd,F€*-(»,y)  hd(F,f) 


(15) 


Note  that  EMD^(a:,  y)  is  invariant  under  translation  of  x  or  y  if  d{x^  Vj+t)  ~  d(x{  —  y3).  Here  we 
have  added  the  superscript  d  to  EMD^  to  show  the  dependence  on  the  ground  distance  function. 
We  have  also  used  the  fact  that  F(x,y)  =  F(x,y  ©  t),  which  follows  directly  from  the  fact  that 
the  weights  of  y  ©  t  are  the  same  as  the  weights  of  y.  Clearly,  it  suffices  to  minimize  hd(F,  t)  to 
compute  the  EMD  under  translation.  In  section  7.1,  we  give  a  direct,  but  inefficient,  algorithm  to 
compute  the  global  minimum  of  hd(F,  t)  over  the  region 


R{x,y)  =  {(F,t)  :  F  E  F(x,  y),  t  eRd  }  =  F{x,y)  xHd. 


In  section  7.2,  we  give  an  efficient  iterative  algorithm  that  always  converges  monotonically,  although 
not  necessarily  to  the  global  minimum.  Nonetheless,  it  may  find  the  global  minimum  if  the  iteration 
is  run  with  a  few  different  initial  translations. 
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Both  the  direct  and  iterative  algorithms  require  a  solution  to  the  following  minimization  prob¬ 
lem:  for  F  =  ( fij )  €  F(x,y)  fixed,  compute 


min  hd{F,  t)  =  min  WORK(E,  x,  y  ®  t).  (16) 

te  Rd  t€Rd 


If 


d{xi ,  yj  +  t)  =  d(xi  -  yj,  t), 


(17) 


then  (16)  can  be  written  as 


min 

f£Rrf 


m  n 

Y.  ~  VV  *)• 

i=l j- 1 


Note  that  condition  (17)  holds  for  any  Lp  distance  function  d.  If  we  let  Zij  =  xt  —  yj  and  we  convert 
the  two-dimensional  index  ij  into  a  one-dimensional  index  l  to  obtain  fi  and  zi,  then 


m  n  mn 

fijd{xuVj  +  t)  =  ^fid{zht), 


i=  1  j- 1 


1= 1 


and  the  minimization  problem 


m  n 

min  V  V  fijd{xi  -  yj:t ) 


min 

tend 


mn 

Y^fld(zut) 


Z=1 


(18) 


asks  for  a  point  t  which  minimizes  a  sum  of  weighted  distances  to  a  given  set  of  points.  This  minisum 
problem  is  the  subject  of  section  8,  where  we  show  how  to  solve  the  problem  when  the  distance 
function  d  is  the  L2-distance  squared  (section  8.1),  the  Li-distance  (section  8.2),  and  the  ^-distance 

jji  j 

(section  8.3).  The  solutions  to  these  three  problems  allow  us  to  compute  EMDr2,  EMDrx,  and 
EMD^-2,  respectively.  It  should  be  noted,  however,  that  even  for  equal- weight  distributions,  using 
the  /.^-distance  squared  for  the  ground  distance  means  that  the  EMD  is  no  longer  a  metric.  One 
reason  to  consider  the  I^-distance  squared  is  that  there  is  a  simple  closed  form  solution  for  the 
optimal  translation  if  the  distributions  are  equal  weight  (see  section  8.1). 


7.1  A  Direct  Algorithm 

The  function  hd(F,t)  is  linear  in  F.  It  follows  that  for  t  fixed,  the  minimum  value 


min 

FeF(x,y) 


hd(F,t ) 


is  achieved  at  one  of  the  vertices  (dependent  on  t)  of  the  convex  polytope  T(x,y).  If  we  let 


V(x,y )  =  {  Vi,  ...,  vN  } 


denote  the  finite  set  of  vertices  of  F(x,y),  then 

min  hd(F,t)  —  hd(F*(t),t)  for  some  vertex  F* (t)  €  V(x,  y) , 
FeF(x,y) 


and 

min  hd(F,t)  -  min  hd(F*(t),t).  (19) 

(F,t)eR(x,y)  te  Rd 
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The  minimum  on  the  right-hand  side  of  (19)  can  be  rewritten  as 


min  hd(F*(t),  t)  —  min  min  hd(F,t), 
ten*  Fev(x,y)  ten* 


so  that 


min  hd(F.t)  =  min  min  hd(F,  t). 

(F,t)eR(x,y)  FeV(x,y)  teKd 


(20) 


Thus,  if  the  innermost  minimum  on  the  right-hand  side  of  (20)  exists,  then  the  minimum  on  the 
left-hand  side  of  (20)  must  also  exist  and  must  be  achieved  at  some  (F*,t*),  where  F*  £  V(x,y). 
Given  an  algorithm  to  compute 

min  hd(F,t) 

t€  Rd 

for  a  fixed  F,  the  minimum  on  the  left-hand  side  of  (20)  may  be  computed  by  simply  looping  over 
all  the  vertices  in  V(x,y): 

min  hd(F,  t)  —  min  min  hd(vhA).  (21) 

(F,t)£R(x9y)  k=i,...,N  tend  '  v  ' 


Only  a  finite  number  of  flow  values  must  be  examined  to  find  the  minimum  work. 

Although  this  simple  strategy  guarantees  that  we  find  a  globally  optimal  translation,  it  is  not 
practical  because  N  can  be  very  large.  We  may  eliminate  the  variable  fu  in  the  definition  of  a 
feasible  flow  by  solving  (4)  for  fu  as  an  affine  combination  of  the  other  /^*’s.  Substituting  for  fu 
in  (1),  (2),  and  (3)  leaves  mn  +  m  +  n  linear  inequalities.  This  reasoning  shows  that  F(x,  y)  is  an 
(ran  —  1) -dimensional  convex  polytope  defined  by  the  intersection  of  mn  +  m  +  n  halfspaces.  The 
Upper  Bound  Theorem  ([13], [3])  states  that  a  simple  polytope  in  Hd  with  n  facets  has  0(nLd/2 J) 
vertices,  and  there  are  examples  for  which  this  bound  is  tight.  Therefore,  y)  can  have  as  many 
as  N  —  0((mn  —  1  )mn+m+n)  vertices.  Even  for  small  values  of  m  and  n,  this  is  too  many  vertices  to 
exhaustively  check  in  a  reasonable  amount  of  time.  The  beauty  of  the  simplex  algorithm  ([10])  for 
solving  a  linear  program  is  that  it  provides  a  method  for  visiting  vertices  of  the  feasible  polytope 
in  such  a  way  that  the  objective  function  always  gets  closer  to  its  optimal  value  (and  the  number 
of  vertices  visited  is  always  no  larger  in  order  than  the  maximum  of  the  number  of  variables  and 
the  number  of  constraints).  In  the  next  section,  we  give  an  iterative  algorithm  that  generates  a 
sequence  of  (flow, translation)  pairs  for  which  the  amount  of  work  decreases  or  remains  constant  at 
every  step. 


7.2  An  Iterative  Algorithm 

Consider  the  following  iteration  that  begins  with  an  initial  translation  t 


F(k)  =  (/«) 


r  in  n  \ 

“e  I  £  E  Svi  (*,  Vi  +  ((i|)  ) , 


i=  1  j= 1 


ar§  (  Vj  +  t)  )  . 

i=lj=l 


(22) 

(23) 


The  minimization  problem  on  the  right-hand  side  of  (22)  is  the  familar  transportation  problem. 
Under  the  assumption  (17),  the  minimization  problem  on  the  right-hand  side  of  (23)  is  the  minisnm 
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problem  (18)  to  be  covered  in  section  8.  The  flow  and  translation  iterates  define  the  work  and  EMD 
iterates 

m  n 

WORK®  =  EE  4?’  d(xi,yj  +  tW)  =  WORK  (F®,:r,y©f®), 

i=l j=l 
WORK® 

EMD®  =  - ~r. 

min(ws^s) 

The  order  of  evaluation  is 

j(o)  F(0)  *(i)  . . . 

WORK<°\  EMD<°>  WORK*1',  EMD^ 

By  (22),  we  have 

m  n  mn 

WORK(*+1)  =  E  E  fii+1>d  (*<• « + i“’+1))  s  E  E  ffd  (*<• » + *'*+1))  ■  <24> 

j=l  j=l  j=l  j=l 

From  (23),  we  know 

m  n  m  n 

EE  <EEf ‘'(■»»+<(‘l) =W0RKl‘l-  <25> 

z-1  j=l  i=lj=l 

Combining  (24)  and  (25)  shows 

WORK(fc+1)  <  WORK®.  (26) 


The  decreasing  sequence  (WORK®)  is  bounded  below  by  zero,  and  hence  it  converges  ([7]).  There 
is,  however,  no  guarantee  that  the  work  iteration  converges  to  the  global  minimum  of  hd(F,t)  = 
WORK  (F,x,y@t). 

One  way  for  the  work  iteration  to  converge  is  if  F ®  is  returned  in  step  (22)  as  an  optimal  flow 
for  t®,  and  #+1)  =  i®  is  returned  in  step  (23)  as  an  optimal  translation  for  F®.  Denote  the 
indicator  function  for  this  event  as  MUTUAL  (F®,t®).  It  is  clear  that 


MUTUAL  (F®,t®) 


t(k)  _  t(k+ 1) 

p ®  =  _p(fc+!) 

WORK®  =  WORK(fc+1> 

V 


The  fact  that  F®  is  an  optimal  flow  for  i®  implies 


and 


(27) 


where  a  neighborhood  of  F  E  d(F(x,  y))  must  be  restricted  to  lie  within  F{x,  y).  The  fact  that  i® 
is  an  optimal  translation  for  F®  implies 


((*))=  0. 


(28) 
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Combining  conditions  (27)  and  (28)  shows  that  the  work  iteration  converges  to  either  a  local 
minimum  or  a  saddle  point  value  if  MUTUAL  is  true. 

Now  suppose  that  the  routine  that  solves  the  linear  program  (LP)  in  (22)  always  returns  a 
vertex  of  F(x:y).  The  simplex  algorithm,  for  example,  always  returns  a  vertex  of  the  feasible 
polytope.  This  is  possible  since  there  is  always  a  vertex  of  the  feasible  polytope  at  which  a  linear 
objective  function  achieves  its  minimum.  With  the  assumption  that  the  flow  iterates  are  always 
vertices  of  T(x,y),  there  will  be  only  a  finite  number  of  points  (F,£)  that  the  work  iteration  visits 
because  there  are  a  finite  number  of  flow  iterates,  and  each  translation  iterate  (other  than  the  initial 
translation)  must  be  an  optimal  translation  returned  for  one  of  the  flow  iterates.  It  follows  that 
there  are  only  a  finite  number  of  work  values  generated.  Since  the  work  iteration  is  guaranteed  to 
converge,  the  work  iterates  must  stabilize  at  one  of  these  work  values.  Suppose 

WORK^  -  WORK(fc+1)  =  •  •  •  .  (29) 


Since  there  are  only  a  finite  number  of  pairs  (F,  t)  visited,  condition  (29)  implies  that  there  must 
be  a  repeating  cycle  of  pairs: 


,  ...  ^F^-k+r~1\t^k+r~1^  ,  (^F^k+r\t('k+r^  =  ^F^k\t^k^ 


For  r  >  1,  the  work  iteration  converges  even  though  the  flow  and  translation  iterations  do  not 
converge.  However,  such  a  non-trivial  (flow, translation)  cycle  is  unstable  in  the  sense  that  it  can  be 
broken  (for  any  real  problem  data)  by  perturbing  one  of  the  translation  iterates  by  a  small  amount. 
In  practice,  the  work  iteration  almost  always  converges  because  a  length  r  =  1  cycle  occurs.  A 
cycle  of  length  r  =  1  starting  at  (^F^k\t^k^  is  exactly  the  condition  MUTUAL  and  we 

previously  argued  that  the  work  iteration  converges  to  a  critical  value  in  this  case. 

Finally,  let  us  show  that  the  work  sequence  will  stabilize  at  the  global  minimum  once  F^  =  F* , 
where  ( F *,  t*)  is  optimal  for  some  t*.  First,  it  is  easy  to  see  that  if  ( F*,t *)  =  (V^,  is  optimal, 

then  hd(F*,t*)  =  WORK^  =  WORK(fe+1)  =  •••  .  This  is  an  immediate  consequence  of  the 
optimality  of  and  the  monotonicity  condition  (26).  Now  suppose  F ^  =  F*,  where  (F*,t*) 

is  optimal.  Note  that  and  t*  both  solve  (23),  so 


hd  =  hd  (f(/c),#+1))  =  hd  =  hd(F*,t*). 


(If  (23)  has  a  unique  solution,  then  =  f*.)  Since  condition  (24)  gives 


hd  ^(*+i)jt(*+i)J  <  hd  fp(k)^t{k+ 1)^  =  hd(F\f), 


and  since 

hd  (F(fc+i)j4(*+i))  >  hd(F*,t*)  (optimality  of  {F*,t*)), 

we  must  have 

WORK(fe+1)  =  hd  (F(fc+1  ),#+1))  =  hd(F*,t*). 

(If  (22)  has  a  unique  solution,  then  F'-k  1  1  ’  =  F*.)  We  have  already  argued  that  once  the  work 
sequence  hits  the  minimum,  it  must  repeat  at  this  minimum  forever. 

Let  us  summarize  the  results  of  this  section.  The  work  iteration  always  converges.  We  can 
arrange  to  have  all  flow  iterates  at  the  vertices  of  F(x,y).  In  this  case,  the  (flow, translation) 
iterates  must  cycle.  A  cycle  of  length  r  >  1  will  almost  never  occur,  and  a  cycle  of  length  r  =  1 
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implies  that  the  (flow, translation)  sequence  converges  to  a  critical  point  and,  therefore,  that  the 
work  sequence  converges  to  either  a  local  minimum  or  a  saddle  point  value.  Thus,  in  practice  the 
work  iteration  almost  always  converges  to  a  critical  value.  If  the  flow  iteration  ever  reaches  a  vertex 
at  which  the  minimum  work  occurs  with  a  suitable  choice  of  translation,  then  the  work  iteration 
converges  to  the  global  minimum.  Since  there  is  no  guarantee  that  the  work  iteration  converges 
to  the  global  minimum,  the  iterations  should  be  run  with  a  few  different  starting  translations 
in  search  of  the  true  minimum  work.  In  some  preliminary  experiments,  we  have  found  that  the 
work  iteration  usually  converges  within  a  handful  of  iterations  (three  to  five)  using  d  equal  to  the 
I/2-distance  squared,  the  Li-distance,  or  the  Z/2-distance. 


8  Minimizing  a  Weighted  Sum  of  Distances 

The  abstract  minimization  problem  considered  in  this  section  is 

n 

min  V ]wid(p,pi). 

V  f  * 

i= 1 

We  now  show  how  to  solve  this  problem  when  d  is  the  L2-distance  squared,  the  ii-distance,  and 
the  L2-distance. 


8.1  Minimizing  a  Weighted  Sum  of  Squared  L2  Distances 

If  d  is  the  L2-distanee  squared,  then  the  minisum  problem  is  a  weighted  sum  of  squares  problem 

n 

min  yVlb-pdll. 

P  , 

1=1 

It  is  well-known  (and  easily  proven  using  standard  calculus)  that  the  unique  optimal  location  p*  is 
at  the  centroid 

Returning  the  original  problem  (18)  for  a  moment,  we  have 

Emn  r  „ 

Z=1  Jlzl 


t*  =Z  = 


z  = 


Emn  /■ 

1=1  Jl 

ggzl  E"=i  fijfa  -  Vj) 

Em  n 

i=l  2^j= 1  Jij 

E£xE?=i  fiiixj-yj) 


min(iys,«£) 

When  x  and  y  are  equal-weight  distributions, 


Z  = 


t*  =  z  =  x  —  y. 


wy,  ut. 

YZUwiXj  E"=1  u3y3 

Us 


((6),  (7)) 


In  the  equal  weight  case,  the  best  translation  for  any  feasible  flow  F  =  {f\j)  is  x  —  y.  The  iteration 

L2 

given  in  section  7.2  is  not  needed  in  this  case  to  compute  EMD r2(x,y).  Instead,  simply  translate 
y  by  x  -  y  (this  lines  up  the  centroids  of  x  and  y)  and  compute  EMDL2(x,  y  ©  (x  -  y)). 
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8.2  Minimizing  a  Weighted  Sum  of  L\  Distances 

In  this  section,  we  consider  the  minisum  problem  when  d  is  the  Li-distance.  The  minimization 
problem  is 


n 

mjn  £W*IIP-Pilll  = 

P  1=1 

n 

min  Sw<||p-p<||x  = 

p  i=  1 

where  p-fci  and  p[k>  are  the  kth  components  of  p  and  p;,  respectively.  Thus,  a  solution  to  the 
problem  in  one  dimension  gives  a  solution  to  the  problem  in  d  dimensions  by  simply  collecting  the 
optimal  location  for  each  of  the  one-dimensional  problems  into  a  d-dimensional  vector. 

Now  suppose  Pi  <  P2  <•■•<  Pn  are  points  along  the  real  line,  and  we  want  to  minimize 

n 

9(P)  =  YsWi\P~Pi  I’ 

i= 1 

Let  po  =  —  oo  and  pn+i  =  +oc.  Then 


mm 
p 

i—  1  k—  1 

d  /  n 

mjn 

k—l  \i= 1 
d  /  n 

L  hi? 

k=  1  \p  i= 1 


ipw  -  p!*1 


I  n 

9(p)  =  Ylwi(p-pp  +  EE  wi(Pi~P)  for  p  G  Z  =  0,...,n. 

i=i  «=J+i 

Over  the  interval  [p/,p/+1],  p(p)  is  affine  in  p: 


d(p)  =  I  EE  ~  EE  P+  EE  WiPi  ~  E*E  ^Pi )  for  p  G  [poPi+i]. 


ii=  1  z=/-fl 


z=l 


If  we  let 

l  n 

mi  =  J2wi~  EE  wi 

i~  1  i=/+l 

denote  the  slope  of  p(p)  over  [p/,p/+x],  then 


(30) 


-ms  =  mo  <  mi  <  •  •  •  <  mn  =  we, 


and 


mi+i  =  mi  +  2wp 


The  function  g(p )  is  a  continuous  piecewise  linear  function  with  slope  increasing  from  a  negative 
value  at  —  oo  to  a  positive  value  at  +00,  and  as  such  it  obviously  has  a  minimum  value  at  the  point 
when  its  slope  first  becomes  nonnegative.  Let 


l*  =  min  {l  :  m;  >  0  }. 

If  mi .  ±  0,  then  then  the  unique  minimum  value  of  g(p)  occurs  at  pi*.  Otherwise,  mi*  —  0  and 
the  minimum  value  of  g(p)  is  achieved  for  p  E  [pp,pj»+i].  See  figure  15.  In  the  special  case  of 
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equal-weight  points,  the  minimum  value  occurs  at  the  ordinary  median  value  of  the  points.  If 
Wi  =  w,  then  it  follows  easily  from  (30)  that  m/  =  w(2l  —  n).  If  n  is  odd,  then  l*  --  \n/ 2],  mi .  >  0, 
and  the  unique  minimum  of  g(p)  occurs  at  the  median  point  p\np>  ■  If  n  is  even,  then  /*  =  n/2, 
rri[-  =  0,  and  the  minimum  value  of  g(p)  is  attained  for  every  point  in  the  interval  [pn/2»P(n/2)+i]- 
See  figure  16. 


8.3  Minimizing  a  Weighted  Sum  of  L2  Distances 

The  final  minisum  problem  that  we  consider  is  when  d  is  the  -/^-distance  function.  The  minimization 
problem 

n 

min  J^WiWp  -  Pi\\2  (31) 

P  i-l 

has  a  long  history  ([15]).  A  basic  iteration  procedure  that  solves  this  problem  was  proposed  in  1937 
by  Weiszfeld  ([14]).  Consider  the  objective  function 

n 

ff(p)  =  S^lb-Pilb- 
*=1 


If  the  points  pi, . . .  ,pn  are  not  collinear,  then  g(p)  is  strictly  convex  and  has  a  unique  minimum.  If 
Pu---iPn  are  collinear,  then  an  optimal  point  must  lie  on  the  line  through  the  given  points  (if  not, 
one  could  project  the  claimed  optimal  point  onto  to  the  line,  thereby  decreasing  its  distance  to  all 
the  given  points,  to  obtain  a  better  point).  In  this  case,  the  algorithm  given  in  section  8.2  for  points 
on  the  real  line  can  be  used  (the  ^-distance  reduces  to  the  absolute  value  in  one-dimension).  The 
objective  function  is  differentiable  everywhere  except  at  the  given  points: 

dg  =  A  wt{p-pt) 

dp  f^Wp-Pih' 

Setting  the  partial  derivative  to  zero  results  in  the  equation 


A  wj(p-pj) 

fellP-ftlla 


which  cannot  be  solved  explicitly  for  p.  The  Weiszfeld  iteration  replaces  the  p  in  the  numerator 
by  the  ( k  -I-  l)st  iterate  p(fc+1)  and  the  p  in  the  denominator  by  the  Arth  iterate  p^k\  and  solves  for 

p(k+i). 


p(*+i)  = 


XLi 

Pi 


if P{k)  ^PU-'-iPn 

if  p(k)  =  Pi 


Here  are  some  facts  about  this  iteration  (assuming  the  input  points  are  not  collinear). 


•  The  iteration  always  converges.  ([9]) 

•  If  no  iterate  p ^  is  equal  one  of  the  given  points,  then  the  iteration  converges  to  the  global 
minimum  location  of  g(p).  ([9]) 

•  The  iteration  can  fail  to  converge  to  the  global  minimum  location  for  a  continuum  of  starting 
values  because  some  iterate  p ^  becomes  equal  to  a  non-optimal  given  point.  ([2]) 

•  If  the  optimal  location  is  not  at  one  of  the  given  points,  then  convergence  will  be  linear.  ([8]) 
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•  If  the  optimal  location  is  at  one  of  the  given  points,  then  convergence  can  be  linear,  super- 
linear,  or  sublinear.  ([8]) 

Since  convergence  to  the  global  minimum  location  is  not  guaranteed,  the  iteration  should  be  run 
more  than  once  with  different  starting  points. 

It  is  conjectured  in  [2]  that  if  the  starting  point  is  within  the  affine  subspace  P  spanned  by  the 
given  points,  then  the  Weiszfeld  iteration  is  guaranteed  to  converge  to  the  global  minimum  location 
for  all  but  a  finite  number  of  such  starting  points.  If  this  conjecture  is  true,  then  the  iteration  will 
converge  with  high  probability  to  the  optimal  location  if  one  chooses  a  random  starting  point  in 
P.  Note  that  P  is  the  entire  space  Rd  if  the  n  -  1  vectors  pn  —  p\  ,pn  -  p2, . . . , pn  -  pn-i  span  all 
of  Rd.  If  the  given  points  are  random,  this  event  is  very  likely  to  occur  if  n  -  1  >  d.  In  regards  to 
speeding  up  convergence,  see  [5]  for  an  accelerated  Weiszfeld  procedure. 


9  Conclusion 

We  have  presented  several  lower  bounds  on  the  EMD  which  do  not  require  equal-weight  distribu¬ 
tions,  and  are  therefore  applicable  to  partial  queries.  The  effectiveness  of  the  bounds  was  illustrated 
in  a  color-based  retrieval  system  where  applying  one  bound  per  query  almost  always  resulted  in  a 
smaller  query  time  than  brute  force  query  processing.  Using  a  combination  of  bounds  per  query 
may  improve  search  times  even  more.  In  particular,  a  promising  combination  seems  to  be  the 
CBOX  bound  followed  by  the  PASUMfsbl  projection-based  bound.  The  CBOX  bound  is  faster 
to  compute,  but  the  PASUMfsbl  bound  makes  stronger  use  of  the  distributions  than  simply  using 
averages.  The  latter  bound  seems  to  be  the  best  of  the  projection-based  bounds  that  we  proposed, 
although  this  may  vary  with  the  database  and  mode  of  query.  More  experimentation  is  needed  to 
see  if  there  is  a  clear  best  bound  or  combination  of  bounds  for  a  majority  of  applications. 

The  other  main  topic  of  this  work  was  computing  the  EMD  under  translation.  The  frameworks 
of  the  proposed  methods  are  still  applicable  when  the  transformation  group  is  not  the  translation 
group.  In  our  methods,  we  must  solve  the  problem  of  finding  the  best  transformation  for  a  given 
flow.  This  problem  reduces  to  problems  with  known  solutions  in  the  translation  case  when  the 
ground  distance  is  the  L\ -distance,  the  Lo-distance.  or  the  L2-distance  squared.  Once  we  can  find 
the  best  transformation  for  a  given  flow,  we  can  still  find  the  global  minimum  by  looping  over 
the  vertices  of  a  convex  polytope,  and  a  local  minimum  (almost  always)  using  our  two  stage  mini¬ 
mization  framework.  Future  work  will  consider  other  types  of  transformations  such  as  Euclidean, 
similarity,  and  affine  transformations. 
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I  A  Lower  Bound  on  the  L2-Norm  in  terms  of  the  Iq-Norm 

Theorem  7 


M|2  >  4=||o||i  VaG  Rd. 

Vd 


Proof  The  inequality  obviously  holds  when  a  =  0,  so  it  suffices  to  show  that 

•  NI2  1 


min 


The  homogeneity  of  all  Lp  norms 


a^o  ||a||i  Vd 


\ca\L  =  \c\  llalL  for  c  e  R 


implies  that 


min 

a^O 


“in  II0II2 

I  |a|  1 1 =1 


If  abs(a)  denotes  the  vector  formed  by  taking  the  absolute  value  of  each  of  the  components  of  a, 
then  ||abs(a)||p  =  ||a||p.Tt  follows  that 


Let 


Then 


min  MaMo 

IWIi=i 


min 

a>0,||o||i  =  l 


IMI2. 


d 


f(a)  =  '$2al 
k= 1 


and 


9(a)  = 


-  1. 


min 

o>0,  ||a||i=l 


(min 

g(a)= 0 


According  to  the  theory  of  Lagrange  multipliers,  we  must  have 


(V/)(a*)  =  A((Vp)(a*))  for  some  A  6  R. 
2a*  =  A1 


at  an  extremum  location  a*,  where  1  denotes  a  vector  of  d  ones.  Solving  for  a*  gives  a|l  =  A/2 
for  k  —  1  Solving  for  A  in  the  constraint  g(a*)  —  0  gives  A  =  2/d.  Hence  a*k  =  1/d 

for  k  —  1  and  /(a*)  =  1/d.  Obviously,  there  is  no  maximum  value  for  the  homogeneous 

function  ||a||2  on  the  plane  g(a)  =  0.  Therefore, 

min  /(a)  =  /(a*)  =  1 

g(a)-0  a 

Taking  the  square  root  of  both  sides  completes  the  proof.  ■ 
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